Combining Document Representations for Prior-art Retrieval

نویسندگان

  • Eva D'hondt
  • Suzan Verberne
  • Wouter Alink
  • Roberto Cornacchia
چکیده

In this paper we report on our participation in the CLEF-IP 2011 prior art retrieval task. We investigated whether adding syntactic information in the form of dependency triples to a bag-of-words representation could lead to improvements in patent retrieval. In our experiments, we investigated this effect on the title, abstract and first 400 words of the description section. The experiments were conducted in the Spinque framework with which we tried to optimize for the combinations of text representation and document sections. We found that adding triples did not improve overall MAP scores, compared to the baseline bag-of-words approach but does result in slightly higher set recall scores. In future work we will extend our experiments to use all the text sections of the patent documents and fine-tune the mixture weights.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Learning Document Image Features With SqueezeNet Convolutional Neural Network

The classification of various document images is considered an important step towards building a modern digital library or office automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are considered to be the current state of the art model for this task. However, there are two major drawbacks for these classifiers: the huge computational power demand for...

متن کامل

Probabilistic Document Length Priors for Language Models

This paper addresses the issue of devising a new document prior for the language modeling (LM) approach for Information Retrieval. The prior is based on term statistics, derived in a probabilistic fashion and portrays a novel way of considering document length. Furthermore, we developed a new way of combining document length priors with the query likelihood estimation based on the risk of accep...

متن کامل

Prior Art Search and Its Evaluation

Prior Art Search is an information seeking task where searchers, for instance patent examiners, search for published literature to determine whether the claimed invention in a patent application is novel. In Prior Art Search, search tasks are often timesensitive and consist of rich information needs with multiple aspects/subtopics. In this thesis, we explore information retrieval techniques and...

متن کامل

Towards robust methods for spoken document retrieval

In this paper, we investigate a number of robust indexing and retrieval methods in an effort to improve spoken document retrieval performance in the presence of speech recognition errors. In particular, we examine expanding the original query representation to include confusible terms; developing a new document-query retrieval measure based on approximate matching that is less sensitive to reco...

متن کامل

Prior Art Retrieval Using Various Patent Document Fields Contents

In this paper, we report our approach to retrieve patent documents based on the prior art. We use the standard Information Retrieval (IR) techniques which contain indexing and retrieval processes. We use various combinations of document fields for the query formulation. Based on the evaluation summary, we achieve the best result for the combinations of invention-title, description and claims fi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011